
    Digital Object Cloud for linking natural science collections information: The case of DiSSCo

    DiSSCo (the Distributed System of Scientific Collections) is a Research Infrastructure (RI) aiming to provide unified physical (transnational), remote (loans) and virtual (digital) access to the approximately 1.5 billion biological and geological specimens held in collections across Europe. DiSSCo represents the largest formal agreement ever made between natural science museums (114 organisations across 21 European countries). With political and financial support from 14 European governments and a robust governance model, DiSSCo will deliver, by 2025, a series of innovative end-user discovery, access, interpretation and analysis services for natural science collections data. As part of DiSSCo's developing data model, we evaluate the application of Digital Objects (DOs), which can act as the centrepiece of its architecture. DOs have bit-sequences representing some content, are identified by globally unique persistent identifiers (PIDs) and are associated with different types of metadata. The PIDs can be used to refer to different types of information, such as locations, checksums, types and other metadata, to enable immediate operations. In the world of natural science collections, currently fragmented data classes (inter alia genes, traits, occurrences) derived from the study of physical specimens can be re-united as parts of a virtual container, i.e., as components of a Digital Object. These typed DOs, when combined with software agents that scan the data offered by repositories, can act as complete digital surrogates of the physical specimens. In this paper we: (1) investigate the architectural and technological applicability of DOs for large-scale data RIs for bio- and geo-diversity; (2) identify benefits and challenges of a DO approach for the DiSSCo RI; and (3) describe key specifications (incl. metadata profiles) for a new specimen-based DO type.
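    To make the Digital Object idea above concrete, here is a minimal Python sketch of a typed DO acting as a virtual container for specimen-derived data classes. All class names, fields and identifier values are illustrative assumptions, not DiSSCo's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DigitalObject:
    """A Digital Object: content bit-sequences identified by a globally
    unique persistent identifier (PID) and carrying typed metadata."""
    pid: str                   # e.g. a Handle; globally unique and resolvable
    do_type: str               # registered type enabling immediate operations
    checksum: str              # integrity information referenced via the PID
    locations: List[str]       # where the bit-sequences can be retrieved
    metadata: Dict[str, str] = field(default_factory=dict)

@dataclass
class DigitalSpecimen(DigitalObject):
    """A typed DO acting as a virtual container that re-unites data classes
    (genes, traits, occurrences, ...) derived from one physical specimen."""
    physical_specimen_id: str = ""                        # link to the physical object
    parts: Dict[str, str] = field(default_factory=dict)   # data class -> PID of component DO

# A specimen whose sequence and trait records live in other repositories:
ds = DigitalSpecimen(
    pid="20.5000.1025/abc-123",            # hypothetical Handle
    do_type="DigitalSpecimen",
    checksum="sha256:placeholder",
    locations=["https://repo.example.org/objects/abc-123"],
    physical_specimen_id="NHM-X-12345",    # hypothetical catalogue number
    parts={"genes": "20.5000.1025/seq-77", "traits": "20.5000.1025/trait-9"},
)
print(ds.parts["genes"])  # components stay independently resolvable by PID
```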

    FAIR data and services in biodiversity science and geoscience

    We examine the intersection of the FAIR principles (Findable, Accessible, Interoperable and Reusable), the challenges and opportunities presented by the aggregation of widely distributed and heterogeneous data about biological and geological specimens, and the use of the Digital Object Architecture (DOA) data model and components as an approach to solving those challenges that offers adherence to the FAIR principles as an integral characteristic. This approach will be prototyped in the Distributed System of Scientific Collections (DiSSCo), the pan-European Research Infrastructure that aims to unify over 110 natural science collections across 21 countries. We take each of the FAIR principles, discuss them as requirements in the creation of a seamless virtual collection of bio/geo specimen data, and map those requirements to Digital Object components and facilities such as persistent identification, extended data typing, and the use of an additional level of abstraction to normalize existing heterogeneous data structures. The FAIR principles inform and motivate the work, and the DO Architecture provides the technical vision to create the seamless virtual collection vitally needed to address scientific questions of societal importance.
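    The "additional level of abstraction" mentioned above can be illustrated with a short sketch: two heterogeneous source records describing the same specimen are mapped onto one normalized view. The records and field names are invented for illustration and do not come from the paper.

```python
from typing import Any, Dict

# Two heterogeneous source records for the same specimen, as two different
# collection management systems might expose them (invented field names).
record_a = {"catalogNumber": "B-0154837", "sciName": "Quercus robur"}
record_b = {"objectId": "0154837", "taxon": {"scientificName": "Quercus robur"}}

def normalize(record: Dict[str, Any]) -> Dict[str, str]:
    """One extra level of abstraction: map each heterogeneous source schema
    onto a single normalized view, so clients of the virtual collection
    always see the same structure."""
    name = record.get("sciName") or record.get("taxon", {}).get("scientificName", "")
    ident = record.get("catalogNumber") or record.get("objectId", "")
    return {"identifier": ident, "scientificName": name}

print(normalize(record_a))  # {'identifier': 'B-0154837', 'scientificName': 'Quercus robur'}
print(normalize(record_b))  # {'identifier': '0154837', 'scientificName': 'Quercus robur'}
```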

    Conceptual design blueprint for the DiSSCo digitization infrastructure - DELIVERABLE D8.1

    DiSSCo, the Distributed System of Scientific Collections, is a pan-European Research Infrastructure (RI) mobilising and unifying bio- and geo-diversity information connected to the specimens held in natural science collections and delivering it to scientific communities and beyond. Bringing together 120 institutions across 21 countries, and combining earlier investments in data interoperability practices with technological advancements in digitisation, cloud services and semantic linking, DiSSCo makes the data from natural science collections available as one virtual data cloud, connected with data emerging from new techniques and not already linked to specimens. These new data include DNA barcodes, whole genome sequences, proteomics and metabolomics data, chemical data, trait data and imaging data (Computer-assisted Tomography (CT), Synchrotron, etc.), to name but a few, and will lead to a wide range of end-user services, beginning with finding, accessing, using and improving data. DiSSCo will deliver the diagnostic information required for novel approaches and new services that will transform the landscape of what is possible in ways that are hard to imagine today.

With approximately 1.5 billion objects to be digitised, bringing natural science collections to the information age is expected to result in many tens of petabytes of new data over the coming decades, used on average by 5,000 to 15,000 unique users every day. This requires new skills, clear policies, robust procedures and new technologies to create, work with and manage large digital datasets over their entire research data lifecycle, including long-term storage, preservation and open access. Such processes and procedures must match, and be derived from, the latest thinking in open science and data management, realising the core principles of 'findable, accessible, interoperable and reusable' (FAIR).

Synthesised from the results of the ICEDIG project ('Innovation and Consolidation for Large Scale Digitisation of Natural Heritage', EU Horizon 2020 grant agreement No. 777483), the DiSSCo Conceptual Design Blueprint covers the organisational arrangements, processes and practices, the architecture, tools and technologies, culture, skills and capacity building, and governance and business model proposals for constructing the digitisation infrastructure of DiSSCo. In this context, the digitisation infrastructure of DiSSCo must be interpreted as the infrastructure (machinery, processing, procedures, personnel, organisation) offering Europe-wide capabilities for mass digitisation and digitisation-on-demand, and for the subsequent management (i.e., curation, publication, processing) and use of the resulting data. The blueprint constitutes the essential background needed to continue raising the overall maturity of the DiSSCo Programme across multiple dimensions (organisational, technical, scientific, data, financial) to achieve readiness to begin construction. Today, collection digitisation efforts have reached most collection-holding institutions across Europe. Much of the leadership, and many of the people involved in digitisation and working with digital collections, wish to take steps forward and expand these efforts to benefit further from the already noticeable positive effects.
The collective results of examining technical, financial, policy and governance aspects show the way forward to operating a large distributed initiative, i.e., the Distributed System of Scientific Collections (DiSSCo), for natural science collections across Europe. Ample examples of, opportunities for, and need for innovation and consolidation in large-scale digitisation of natural heritage have been described. The blueprint makes one hundred and four (104) recommendations to be considered by other elements of the DiSSCo Programme of linked projects (i.e., SYNTHESYS+, COST MOBILISE, DiSSCo Prepare, and others to follow) and by the DiSSCo Programme leadership as the journey towards organisational, technical, scientific, data and financial readiness continues. Nevertheless, significant obstacles must be overcome as a matter of priority if DiSSCo is to move beyond its Design and Preparatory Phases during 2024. Specifically, these include:

Organisational: Strengthen common purpose by adopting a common framework for policy harmonisation and capacity enhancement across broad areas, especially in respect of digitisation strategy and prioritisation, digitisation processes and techniques, data and digital media publication and open access, protection of and access to sensitive data, and administration of access and benefit sharing. Pursue the joint ventures and other relationships necessary to the successful delivery of the DiSSCo mission, especially ventures with GBIF and other international and regional digitisation and data aggregation organisations, in the context of infrastructure policy frameworks such as EOSC. Proceed with the explicit aim of avoiding divergences of approach in global natural science collections data management and research.

Technical: Adopt and enhance the DiSSCo Digital Specimen Architecture and, specifically as a matter of urgency, establish the persistent identifier scheme to be used by DiSSCo and (ideally) other comparable regional initiatives. Establish the (software) engineering development and (infrastructure) operations teams and direction essential to the delivery of the services and functionalities expected from DiSSCo, such that earnest engineering can lead to an early start of DiSSCo operations.

Scientific: Establish a common digital research agenda leveraging Digital (extended) Specimens as anchoring points for all specimen-associated and -derived information, demonstrating to research institutions and policy/decision-makers the new possibilities, opportunities and value of participating in the DiSSCo research infrastructure.

Data: Adopt the FAIR Digital Object Framework and the International Image Interoperability Framework (IIIF) as the low-entropy means of achieving uniform access to rich data (image and non-image) that is findable, accessible, interoperable and reusable (FAIR); a sketch of such a uniform image request follows after this list. Develop and promote best-practice approaches towards achieving the best digitisation results in terms of quality (best, according to agreed minimum information and other specifications), time (highest throughput, fast) and cost (lowest, minimal per specimen).

Financial: Broaden the attractiveness (i.e., improve the bankability) of DiSSCo as an infrastructure to invest in. Plan ways to bridge the funding gap and avoid disruptions in the critical funding path that risk interrupting core operations, especially where the gap opens between the end of preparations and the beginning of implementation due to unsolved political difficulties.
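The Data recommendation above names IIIF as the route to uniform access to images. As an illustration of why that helps, the sketch below builds IIIF Image API request URLs from the standard template ({identifier}/{region}/{size}/{rotation}/{quality}.{format}); the server and image identifier are hypothetical.

```python
# Build IIIF Image API 3.0 request URLs. Because every IIIF server answers
# the same URL template, a client needs no server-specific logic: this is
# the "uniform access" the blueprint recommends.
def iiif_image_url(server: str, identifier: str,
                   region: str = "full", size: str = "max",
                   rotation: int = 0, quality: str = "default",
                   fmt: str = "jpg") -> str:
    """{server}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}"""
    return f"{server}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

# Full-resolution image of a (hypothetical) digitised herbarium sheet:
print(iiif_image_url("https://iiif.example.org/image/v3", "specimen-0154837"))
# A 512-pixel-wide preview of its top-left quadrant:
print(iiif_image_url("https://iiif.example.org/image/v3", "specimen-0154837",
                     region="pct:0,0,50,50", size="512,"))
```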
Strategically, it is vital to balance the multiple factors addressed by the blueprint against one another to achieve the desired goals of the DiSSCo Programme. Decisions cannot be taken on one aspect alone without considering the others, and here the various governance structures of DiSSCo (General Assembly, advisory boards, and stakeholder forums) play a critical role over the coming years.

    The Bari Manifesto: An interoperability framework for essential biodiversity variables

    Essential Biodiversity Variables (EBVs) are fundamental variables that can be used for assessing biodiversity change over time, determining adherence to biodiversity policy, monitoring progress towards sustainable development goals, and tracking biodiversity responses to disturbances and management interventions. Data from observations or models that provide measured or estimated EBV values, which we refer to as EBV data products, can help to capture these processes and trends and can serve as a coherent framework for documenting trends in biodiversity. Using primary biodiversity records and other raw data as sources to produce EBV data products depends on cooperation and interoperability among multiple stakeholders, including those collecting and mobilising data for EBVs and those producing, publishing and preserving EBV data products. Here, we encapsulate ten principles for current best practice in EBV-focused biodiversity informatics as 'The Bari Manifesto', serving as implementation guidelines for data and research infrastructure providers to support the emerging EBV operational framework based on trans-national and cross-infrastructure scientific workflows. The principles provide guidance on how to contribute towards the production of EBV data products that are globally oriented, while remaining appropriate to the producer's own mission, vision and goals. The ten principles cover: data management planning; data structure; metadata; services; data quality; workflows; provenance; ontologies/vocabularies; data preservation; and accessibility. For each principle, desired outcomes and goals have been formulated. Some specific actions related to fulfilling the Bari Manifesto principles are highlighted in the context of each of four groups of organisations contributing to enabling data interoperability: data standards bodies, research data infrastructures, the pertinent research communities, and funders. The Bari Manifesto provides a roadmap enabling support for routine generation of EBV data products, and increases the likelihood of success of a global EBV framework.
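    As an illustration of how several of the ten principles could surface in practice, the sketch below shows a minimal metadata stub for a hypothetical EBV data product. All field names and values are assumptions made for illustration, not a schema defined by the manifesto.

```python
# Invented metadata stub for an EBV data product; each section gestures at
# one of the Bari principles (metadata, vocabularies, provenance, workflows,
# quality, preservation, services/accessibility).
ebv_data_product = {
    "title": "Example: tree cover change 2000-2020",
    "metadata": {"standard": "e.g. EML or ISO 19115"},                # metadata
    "vocabularies": ["Darwin Core terms where applicable"],           # ontologies/vocabularies
    "provenance": {
        "source_datasets": ["doi:10.9999/example"],                   # PID-referenced raw inputs
        "workflow": "documented, re-runnable pipeline",               # workflows
    },
    "quality": {"checks": ["completeness", "taxonomic validation"]},  # data quality
    "preservation": {"repository": "certified long-term archive"},    # data preservation
    "access": {"licence": "CC BY 4.0", "protocol": "standard web service"},  # services/accessibility
}

print(sorted(ebv_data_product))  # the principle-aligned sections of the record
```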

    The role of natural science collections in the biomonitoring of environmental contaminants in apex predators in support of the EU's zero pollution ambition

    The chemical industry is the leading sector in the EU in terms of added value. However, contaminants pose a major threat, and significant costs, to the environment and human health. While EU legislation and international conventions aim to reduce this threat, regulators struggle to assess and manage chemical risks, given the vast number of substances involved and the lack of data on exposure and hazards. The European Green Deal sets a 'zero pollution ambition for a toxic-free environment' by 2050, and the EU Chemicals Strategy calls for increased monitoring of chemicals in the environment. Monitoring of contaminants in biota can, inter alia: provide regulators with early warning of bioaccumulation problems with chemicals of emerging concern; trigger risk assessment of persistent, bioaccumulative and toxic substances; enable risk assessment of chemical mixtures in biota; and enable assessment of the effectiveness of risk management measures and of chemicals regulations overall. A number of these purposes are to be addressed under the recently launched European Partnership for Risk Assessment of Chemicals (PARC). Apex predators are of particular value to biomonitoring. Securing sufficient data at the European scale implies large-scale, long-term monitoring and a steady supply of large numbers of fresh apex predator tissue samples from across Europe. Natural science collections are very well placed to supply these. Pan-European monitoring requires effective coordination among field organisations, collections and analytical laboratories for the flow of required specimens, the processing and storage of specimens and tissue samples, contaminant analyses delivering pan-European data sets, and the provision of specimen and population contextual data. Collections are well placed to coordinate this. The COST Action European Raptor Biomonitoring Facility provides a well-developed model showing how this can work, integrating a European Raptor Biomonitoring Scheme, Specimen Bank and Sampling Programme. Simultaneously, the EU-funded LIFE APEX project has demonstrated a range of regulatory applications using cutting-edge analytical techniques. PARC plans to make best use of such sampling and biomonitoring programmes. Collections are poised to play a critical role in supporting PARC objectives and thereby contribute to the delivery of the EU's zero-pollution ambition.

    FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units

    Data science faces the following major challenges: (1) developing scalable cross-disciplinary capabilities, (2) dealing with increasing data volumes and their inherent complexity, (3) building tools that help to build trust, (4) creating mechanisms to operate efficiently in the domain of scientific assertions, (5) turning data into actionable knowledge units and (6) promoting data interoperability. As a way to overcome these challenges, we further develop the proposals by early Internet pioneers for Digital Objects as encapsulations of data and metadata made accessible by persistent identifiers. In the past decade, this concept was revisited by various groups within the Research Data Alliance and placed in the context of the FAIR Guiding Principles for findable, accessible, interoperable and reusable data. The basic components of a FAIR Digital Object (FDO) as a self-contained, typed, machine-actionable data package are explained. A survey of use cases has indicated the growing interest of research communities in FDO solutions. We conclude that the FDO concept has the potential to act as the interoperable federative core of a hyperinfrastructure initiative such as the European Open Science Cloud (EOSC).
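    A minimal sketch of the FDO anatomy described above: a PID binds together a registered type, metadata and a reference to the data bit-sequences, and machine-actionability follows from dispatching on the type. The identifiers, type names and toy operation registry below are assumptions, not part of any official FDO specification.

```python
# Identifiers and type names are hypothetical (the 21.T11148 prefix is used
# by real data-type registries, but these suffixes are invented).
fdo = {
    "pid": "21.T11148/fdo-example",           # persistent identifier
    "type": "21.T11148/sequence-dataset",     # registered, machine-resolvable type
    "metadata": {"creator": "A. Researcher", "licence": "CC0"},
    "data_ref": "https://repo.example.org/bits/42",  # the bit-sequences
}

def actionable_operations(fdo_record: dict) -> list:
    """Machine-actionability in sketch form: operations are chosen by looking
    up the object's registered type, not by inspecting the data itself."""
    type_registry = {"21.T11148/sequence-dataset": ["align", "blast", "download"]}
    return type_registry.get(fdo_record["type"], ["download"])

print(actionable_operations(fdo))  # ['align', 'blast', 'download']
```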

    Making small data big : What can Scratchpads do for you?

    Eight case studies from the 550 communities using Scratchpads. Scratchpads is an open-source, free-to-use platform that enables you to work in a collaborative online environment. With a Scratchpad you can easily create your own website to structure, manage, link and publish your biodiversity data.

    ‘The Last Mile’: The registry behind the identifier [Conference Abstract]

    Preserved specimens in natural science collections have lifespans of many decades and often several hundred years. Specimens must be unambiguously identifiable and traceable in the face of changes in physical location, changes in the organisation of the collection to which they belong, and changes in classification. When digitising museum collections, a clear link must be maintained between the physical specimen itself and the information digitally representing that specimen in cyberspace. The idea of a Natural Science Identifier (NSId) as a neutral, unique, universal and stable long-term persistent identifier (PID) of a 'Digital Specimen' is central to museums' ambitions for widening access. An NSId allows easy identification and referencing of specific Digital Specimens, regardless of type, location, owner or user. It provides a digital doorway to physical specimens through which services for arranging loans and visits can be accessed, as well as opening the door to innovative services for manipulating specimens' information directly, for work reliant upon discovery of related third-party information, and for demanding 3D modelling and visualisation of specimens. Because the work takes place within e-Infrastructures/cyberspace, new possibilities are opened for analysing hundreds of thousands of specimens simultaneously, for example by exploiting large-scale cloud computing capacity and deep mining/machine learning. There are several established identifier mechanisms that could be used as a basis for the NSId, but some variant of Handles is most appropriate over the very long term because of its neutrality, resistance to change and sustainability. Adopted uses of the Handle system include the identification of journal articles and datasets in education and research (using Digital Object Identifiers); film and television programme assets in the entertainment sector; financial derivatives; and international shipping and construction. Aside from being stable and sustained over time, an essential requirement of a global PID mechanism is independence from the museums/institutions assigning identifiers. NSIds are opaque insofar as no information can, or should, be inferred solely by inspecting the identifier. Stakeholders change, collections move, and organisations evolve, merge or disappear. Even designations and descriptions of specimens and collections can change. Information should only be revealed when the identifier is resolved via a neutral index. One can debate the most appropriate instantiation of the Handle system, but such debate is beside the point. Relevance, ease of use and added value of the supporting 'NSId Registry' (NSIdR), the index of the different kinds of natural science objects and their relations, are the decisive factors. This can be seen from the example of the Entertainment Identifier Registry (EIDR), founded by the major motion picture studios to create a reliable way to identify and track film and TV content distribution. Focus on the object model, promotional branding and value perception in the target user segment are the critical factors for success. Providing such a registry, seamlessly coupled to the work practices and language of the professionals, addresses the 'last mile' challenge (Koureas et al. 2016).
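A short sketch of the resolution step described above: nothing is inferred from the opaque identifier itself; everything is revealed by resolving it against a neutral index. The example uses the public Handle.Net proxy REST API, which does exist; the NSId value, however, is hypothetical and would not actually resolve.

```python
import requests

def resolve(identifier: str) -> dict:
    """Return the typed value records registered under a Handle-style PID,
    using the Handle.Net proxy's public REST API."""
    resp = requests.get(f"https://hdl.handle.net/api/handles/{identifier}",
                        timeout=10)
    resp.raise_for_status()
    return resp.json()  # e.g. {'responseCode': 1, 'handle': ..., 'values': [...]}

# Hypothetical NSId; a real call needs a registered handle (e.g. a DOI's).
record = resolve("20.5000.1025/nsid-example")
for value in record.get("values", []):
    print(value["type"], "->", value["data"]["value"])
```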
From specimens, class characteristics, storage containers and collections, to specific identifications, images, names, literature references and more, the NSIdR's triple-hierarchy object model, rooted in the OBO Foundry's Biological Collections Ontology, is the key to persistently identifying, relating and indexing the entire range of collection objects of interest to scientists and others working in the bio and geo realms. The NSIdR 'knowledge graph', interoperable with other identifier schemes, supports novel first- and third-party value-added services such as arranging loans and visits, curation and annotation, and machine learning for relationship discovery and pattern exploration.
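As a closing illustration, the registry idea can be pictured as a small knowledge graph in which collection objects of different kinds are nodes linked by typed relations, and value-add services become graph traversals. The node names and relation types below are invented for the sketch, not the NSIdR's actual object model.

```python
from collections import defaultdict

# node -> [(relation, node), ...]; a toy stand-in for a registry knowledge graph
graph = defaultdict(list)

def relate(subject: str, relation: str, obj: str) -> None:
    graph[subject].append((relation, obj))

relate("nsid:specimen-1", "identified_as", "nsid:identification-7")
relate("nsid:specimen-1", "depicted_by", "nsid:image-3")
relate("nsid:specimen-1", "stored_in", "nsid:container-12")
relate("nsid:container-12", "part_of", "nsid:collection-5")

# A value-add service such as "find everything connected to this specimen"
# is then a simple traversal of the typed edges:
def neighbourhood(node: str):
    for relation, other in graph[node]:
        yield node, relation, other

for triple in neighbourhood("nsid:specimen-1"):
    print(*triple)
```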